The Present Invention and the Pending Claims

The present invention relates generally to the field of speech recognition. More
particularly, the invention discloses a technique for disambiguating speech input using
one of voice mode interaction, visual mode interaction, or a combination of voice mode
and visual mode interaction.



selecting two or more tokens generated without translation of the language
in which the speech, audio or combination of speech and audio input is
received, and presenting said tokens as alternatives to the user;

presenting the alternatives to the user in one of voice mode, visual mode,
or a combination of voice and visual mode, and receiving a selection of an
alternative by the user from the plurality of alternatives presented to
the user in one of voice mode, visual mode, or a combination of voice
mode and visual mode; and,

communicating the selected alternative to the application as input to the
application.

Claim 12 (original). The method of claim 11, where the interaction comprises the 
concurrent use of said visual mode and said voice mode. 

Claim 13 (original). The method of claim 12, wherein the interaction comprises the user
selecting from among the plural alternatives using a combination of speech and visual-
based input.

Claim 14 (original). The method of claim 11, wherein the interaction comprises the user
selecting from among the plural alternatives using visual input.

Claims 1, 4-5, 7-8, and 11-14 are currently pending. Reconsideration and allowance of
the pending claims is respectfully requested.

Summary of the Office Action

Claims 1, 4, 7-8, 11 and 14 rejected under 35 U.S.C. 103(a) by Lai et al. (USPN
6,006,183) referred to as Lai hereinafter in view of Duan et al. (US Patent No.
6,223,150) referred to as Duan hereinafter.

Claims 5 and 12-13 are rejected under 35 U.S.C. 103(a) as being unpatentable over Lai in
view of Haddock et al. (USPN 5,265,014) referred to as Haddock hereinafter.



Claims 1 and 11 are currently amended. Support for the amendments in claims 1
and 11 is at paragraphs [0017], [0021], [0022], [0024], [0025], [0027], and
[0028].



The office action states: "Claims 1, 4, 7-8, 11 and 14 rejected under 35 U.S.C.
103(a) by Lai et al. (USPN 6,006,183) referred to as Lai hereinafter in view of Duan et
al. (US Patent No. 6,223,150) referred to as Duan hereinafter".



applicant's invention discloses an options and parameters component for setting



"presenting the alternatives to the user in one of voice mode, visual mode, or a
combination of voice and visual mode, and receiving a selection of an alternative
by the user from the plurality of alternatives presented to the user in one of voice
mode, visual mode, or a combination of voice mode and visual mode" in claim
11.

Therefore, Lai in view of Duan does not teach or suggest the limitations in
amended claims 1 and 11.

Furthermore, even if Lai and Duan are combined as suggested in the office action,
the combination that results will not arrive at the claimed invention. For example, Lai in
view of Duan will be inoperable and unsuccessful for the purpose of arriving at the
system or corresponding method steps in claims 1 and 11 respectively, because Lai in
view of Duan does not teach:

"an options and parameters component ... for controlling the speech disambiguation
mechanism ...";

"a speech recognition component that ... generates one or more tokens
corresponding to the speech input without translation of the language in
which the audio, speech, or a combination of speech and audio input is
received [illegible]";

"one or more disambiguation components that present the alternatives to the
user in one of voice mode, visual mode, or a combination of voice mode and
visual mode"; and

"receive an alternative selected by the user from the plurality of alternatives
presented to the user in one of voice mode, visual mode, or a combination of
voice mode and visual mode".

parameters required for controlling the disambiguation mechanism by
application (see paragraphs [0017], [0021], [0022], [0024], [0025], and Fig. 1; paragraph
[0024] recites: "The end user 108 and the application 106 can both set parameters 114 to
control the sub-components of the MDM"). The limitation: "receiving parameters from a
user and the application for controlling the speech disambiguation mechanism, wherein
both the user and the application can set the parameters to control said mechanism" in
claims 1 and 11 is not found in Lai or Duan. In contrast to the setting of parameters by
both the user and the application to control the speech disambiguation mechanism in
applicant's invention, neither Lai nor Duan teaches setting of parameters by the application.
Lai teaches assigning a confidence level score by a confidence level scorer 200 of the
speech engine 160 (Lai, col. 3, lines 29-30); enabling a user of the system to select score
thresholds (Lai, col. 3, lines 37-42); and allowing the user application to accept
information from the user control (Lai, col. 4, lines 11-15).

Furthermore, applicant respectfully submits that the Lai and Duan references that are
sought to be combined are in [illegible]. Applicant's invention is a speech
recognition system where tokens are generated based on recognition of the speech uttered
by the user in a single language by a speech recognizer (see paragraph [0011]). In
contrast, Duan is a language translation system where tokens are generated based on
translation of a user's utterance from a source language to a target language (see Duan,
col. 21, lines 9-13; and col. 9, lines 55-65). The method by which tokens are generated by
a speech recognition system for speech uttered in a single language is vastly different and
distinguishable from the method in Duan where tokens are generated by a language
translation system that translates a user's utterance from a source language to a target
language. A person of ordinary skill in the art would not likely look at a language translation
system to find how a plurality of alternative words can be generated and presented to
a user, for use in a speech recognition system, for the reason stated above. Therefore, applicant
respectfully submits that the teachings of Lai and Duan may not be combined.

Claims 4 and 7-8 are dependent on claim 1, and claim 14 is dependent on
claim 11. Since Lai and Duan do not teach, suggest, or motivate the limitations of claims
1 and 11, applicant respectfully submits claims 4, 7-8, and 14 also to be novel over Lai
and Duan. The applicant solicits reconsideration and allowance of claims 1, 4, 7-8, 11,
and 14.

The office action states: "Claims 5 and 12-13 are rejected under 35 U.S.C. 103(a) as
being unpatentable over Lai in view of Haddock et al. (USPN 5,265,014) referred to as
Haddock hereinafter".

Even if Lai and Haddock are combined as suggested by the Examiner, the
combination that results will be inoperable for the purpose intended by claim 1 in
applicant's invention, i.e., "an options and parameters component for receiving user
parameters and application parameters for controlling the speech disambiguation
mechanism, wherein both the user and the application can set the parameters to control
said mechanism, and wherein the parameters include confidence thresholds governing
unambiguous recognition and close matches". In applicant's invention, the parameters are
set by both the user and the application. In contrast, neither Lai nor Haddock teaches or
suggests setting of the parameters by both the user and the application.







In summary, Lai in view of Duan does not teach the following limitations in claims
1 and 11:

"an options and parameters component for receiving user parameters and
application parameters for controlling the speech disambiguation mechanism,
wherein both the user and the application can set the parameters to control said
mechanism" in claim 1, and

"a selection component that identifies, according to a selection algorithm, which
two or more tokens generated without translation of the language in which the
speech, audio or combination of speech and audio input is received, and presents
said tokens as alternatives to the user" in claim 1, and

"one or more disambiguation components that present the alternatives to the user
in one of voice mode, visual mode, or a combination of voice mode and visual
mode, and receive an alternative selected by the user from the plurality of
alternatives presented to the user in one of voice mode, visual mode, or a
combination of voice mode and visual mode" in claim 1, and

"receiving user parameters and application parameters for controlling the speech
disambiguation mechanism, wherein both the user and the application can set the
parameters to control said mechanism, and wherein the parameters include
confidence thresholds governing unambiguous recognition and close matches" in
claim 11,

"selecting two or more tokens generated without translation of the language in
which the speech, audio or combination of speech and audio input is received, and
presenting said tokens as alternatives to the user" in claim 11, and



The applicant's system for disambiguating speech input comprises, in part, a
disambiguation component that presents two or more alternatives to a user in voice mode,
visual mode, or a combination of voice mode and visual mode, and receives an
alternative selected by the user in voice mode, visual mode, or a combination of voice
mode and visual mode (see paragraph [0028]). However, neither Lai nor Haddock teaches
selection of the alternatives by the user using a combination of voice mode and visual
mode. As stated on pages 2 and 3, items i. to v. of the office action and Figure 2, Lai
suggests that an acoustic signal may be inputted to the speech recognizer 190 and the
system may display words with confidence level indicated. However, Lai does not
expressly teach that the preferences are provided to the user to select in voice mode or
visual mode or a combination of voice mode and visual mode, but instead suggests that
the input to the speech recognizer is in voice mode and preferences are provided to
the user in visual mode. Hence, there is no teaching, suggestion or motivation in Lai and
Haddock of the following limitations recited in applicant's claims 12 and 13:

"wherein the disambiguation components present the alternatives to the user in a
visual form and allow the user to select from among the alternatives using a voice
input" of claim 5,

"where the interaction comprises the concurrent use of said visual mode and said
voice mode" of claim 12, and

"wherein the interaction comprises the user selecting from among the plural
alternatives using a combination of speech and visual-based input" of claim 13.

The disambiguation components such as the output generator and the input
handler in the present invention (see paragraphs [0027] and [0028]) allow multimodal
interaction including voice mode interaction, visual mode interaction, or a combination of
voice mode and visual mode for a user with the disambiguation mechanism for the
purposes of disambiguating speech input in case of an ambiguous speech input
recognition. Claims 5, 12, and 13 (by virtue of their dependence on claims 1 and 11)
recite different instances of multimodal interaction of the user with the system for
disambiguating speech input using the disambiguation components.
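
For illustration only, the interaction described above can be sketched in Python. This is a minimal sketch under stated assumptions and not the applicant's implementation: the names Parameters, merge_parameters, select_alternatives, and resolve_selection, the threshold values, the use of a confidence margin to define a close match, and the rule that a user setting overrides an application setting are all hypothetical and do not appear in the application or in the cited references.

```python
from dataclasses import dataclass


@dataclass
class Parameters:
    # Hypothetical fields: the claims only say the parameters include confidence
    # thresholds governing unambiguous recognition and close matches.
    unambiguous_threshold: float = 0.90
    close_match_margin: float = 0.15


def merge_parameters(user: dict, application: dict) -> Parameters:
    """Combine settings from both the user and the application.

    Assumption: where both set the same parameter, the user's value wins.
    """
    return Parameters(**{**application, **user})


def select_alternatives(tokens, params):
    """Return (recognized, alternatives) for a list of (text, confidence) tokens."""
    ranked = sorted(tokens, key=lambda t: t[1], reverse=True)
    if not ranked:
        return None, []
    best_text, best_conf = ranked[0]
    if best_conf >= params.unambiguous_threshold:
        return best_text, []  # unambiguous: no disambiguation needed
    close = [text for text, conf in ranked
             if best_conf - conf <= params.close_match_margin]
    if len(close) < 2:
        return best_text, []  # assumption: no close rival, accept the best token
    return None, close  # two or more alternatives to present to the user


def resolve_selection(alternatives, mode, selection):
    """Map the user's choice back to one of the presented alternatives.

    mode "visual": selection is the index of the item picked on screen.
    mode "voice": selection is the recognized text of the spoken choice.
    """
    if mode == "visual":
        return alternatives[selection]
    if mode == "voice" and selection in alternatives:
        return selection
    raise ValueError("selection does not match any presented alternative")


# Both the user and the application contribute parameters.
params = merge_parameters(user={"close_match_margin": 0.10},
                          application={"unambiguous_threshold": 0.85})
recognized, alternatives = select_alternatives(
    [("Austin", 0.62), ("Boston", 0.58), ("Houston", 0.20)], params)
chosen = resolve_selection(alternatives, "voice", "Boston")
```

In this sketch the alternatives would be shown or spoken to the user by an output component, and the chosen token would then be passed to the application as its input; a combined voice and visual selection is not modeled.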

Common sense dictates that a person of ordinary skill in the art, at the time the
invention was made, would not combine the method of indicating the level of confidence
the system has in its speech recognition as described in Lai and the method for
disambiguating natural language queries using referential input by a user as described in
Haddock, to arrive at the claimed invention because Lai and/or Haddock show no
recognition or appreciation of the following limitations recited in claims 5, 12, and 13:

"wherein the disambiguation components present the alternatives to the user in a
visual form and allow the user to select from among the alternatives using a voice
input" of claim 5,

"where the interaction comprises the concurrent use of said visual mode and said
voice mode" of claim 12, and

"wherein the interaction comprises the user selecting from among the plural
alternatives using a combination of speech and visual-based input" of claim 13.

Furthermore, a secondary consideration of non-obviousness of applicant's
invention is the option provided to the user for selecting the correct uttered word from a
plurality of alternate words, if the speech disambiguation system fails to recognize the
correct uttered word. Moreover, the speech disambiguation system enables the user to
select the correct word in visual mode, voice mode or a combination of voice and visual
mode. Since the applicant's invention offers wider choice and flexibility to the user for
selecting the correct uttered word while using a speech disambiguation system, it is more
likely to be commercially successful.

Another secondary consideration of non-obviousness of applicant's invention is
that the existing art enables a user to set initial parameters for automatic speech
recognition based on user preferences. However, the existing art fails to provide an
option to set parameters based on the end application and requires resetting the
parameters each time based on the application. Hence, there is a long felt but unsolved
need for an automatic speech recognition system that is initialized based on the user
preferences as well as the end application.

In contrast to the above indicia of non-obviousness, Duan, Lai, and Haddock fail
to suggest or implement a method of automatic speech recognition with parameter setting
based on user preferences and the end application.

For the reasons stated above, applicant respectfully submits that claims 5, 12, and
13 are not obvious over the cited references, and applicant solicits reconsideration of the
rejection and allowance of claims 5, 12, and 13.

Conclusion 

Applicant respectfully requests that a timely Notice of Allowance be issued in this
case. If, in the opinion of Examiner Rider, a telephone conference would expedite the
prosecution of this application, Examiner Rider is requested to call the undersigned.

Respectfully submitted,



Date: July 21, 2008 Ashok Tankha, Esq.

Attorney For Applicant
Reg. No. 33,802
Phone: 856-266-5145

Correspondence Address 
36 Greenleigh Drive 
Sewell, NJ 08080
Fax: 856-374-0246 